Jay Ayres, Jason Flannick, Johannes Gehrke, Tomi Yiu

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Publisher: ACM Press 



Full text available: pdf(639.47 KB)

Additional Information: full citation, abstract, references, citings, index terms



We introduce a new algorithm for mining sequential patterns. Our algorithm is especially 
efficient when the sequential patterns in the database are very long. We introduce a novel 
depth-first search strategy that integrates a depth-first traversal of the search space with 
effective pruning mechanisms. Our implementation of the search strategy combines a 
vertical bitmap representation of the database with efficient support counting. A salient 
feature of our algorithm is that it incrementall ...
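
The abstract above mentions a vertical bitmap representation combined with support counting. As a rough illustration only, the following sketch keeps one bitmap per item (a Python integer with one bit per transaction) and counts support by AND-ing bitmaps. The function names and toy data are hypothetical, and the sketch covers plain itemset support only; it does not attempt the paper's sequence-handling or pruning steps, which the truncated abstract does not describe.

```python
def build_bitmaps(transactions):
    """Map each item to an int whose bit `tid` is set when transaction `tid` contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def bitmap_support(itemset, bitmaps):
    """Count transactions containing every item by AND-ing the items' bitmaps."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must be non-empty")
    combined = bitmaps.get(items[0], 0)
    for item in items[1:]:
        combined &= bitmaps.get(item, 0)  # unknown items contribute an empty bitmap
    return bin(combined).count("1")      # number of set bits = support


# Hypothetical toy database: four transactions over items a, b, c.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
bitmaps = build_bitmaps(transactions)
print(bitmap_support(["a", "b"], bitmaps))
```

Each additional item costs one AND over the bitmap, so the count never rescans the original transactions.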

DB-3 (databases): data mining: Scalable sequential pattern mining for biological
sequences

Ke Wang, Yabo Xu, Jeffrey Xu Yu 

November 2004 Proceedings of the thirteenth ACM international conference on 
Information and knowledge management CIKM '04 

Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms

Full text available: pdf(237.01 KB)



Biosequences typically have a small alphabet, a long length, and patterns containing gaps 
(i.e., "don't care") of arbitrary size. Mining frequent patterns in such sequences faces a 
different type of explosion than in transaction sequences primarily motivated in market- 
basket analysis. In this paper, we study how this explosion affects the classic sequential 
pattern mining, and present a scalable two-phase algorithm to deal with this new 
explosion. The Segment Phase first search ...

Keywords: algorithm, bioinformatics, frequent pattern, pruning technique, sequence, 
sequential pattern 
Hong Cheng, Xifeng Yan, Jiawei Han

August 2004 Proceedings of the tenth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '04 

Publisher: ACM Press 

Full text available: pdf(150.79 KB) Additional Information: full citation, abstract, references, index terms

Many real life sequence databases grow incrementally. It is undesirable to mine 
sequential patterns from scratch each time when a small set of sequences grow, or when 
some new sequences are added into the database. Incremental algorithm should be 
developed for sequential pattern mining so that mining can be adapted to incremental 
database updates. However, it is nontrivial to mine sequential patterns incrementally, 
especially when the existing sequences grow incrementally because such growth may ... 

Keywords: buffering pattern, incremental mining, reverse pattern matching, shared 
projection 



Mining block correlations to improve storage performance
Zhenmin Li, Zhifeng Chen, Yuanyuan Zhou 

May 2005 ACM Transactions on Storage (TOS), Volume 1 issue 2 
Publisher: ACM Press 

Full text available: pdf(102 MB) Additional Information: full citation, abstract, references, index terms

Block correlations are common semantic patterns in storage systems. They can be 
exploited for improving the effectiveness of storage caching, prefetching, data layout, and 
disk scheduling. Unfortunately, information about block correlations is unavailable at the 
storage system level. Previous approaches for discovering file correlations in file systems 
do not scale well enough for discovering block correlations in storage systems. In this 
article, we propose two algorithms, C-Miner and ... 

Keywords: Storage management, block correlations, file system management, mining 
methods and algorithms 
Guimei Liu, Hongjun Lu, Wenwu Lou, Jeffrey Xu Yu

August 2003 Proceedings of the ninth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Publisher: ACM Press 

Full text available: pdf(195.56 KB) Additional Information: full citation, abstract, references, citings, index terms

Extensive efforts have been devoted to developing efficient algorithms for mining frequent 
patterns. However, frequent pattern mining remains a time-consuming process, especially 
for very large datasets. It is therefore desirable to adopt a "mining once and using many 
times" strategy. Unfortunately, there has been little work reported on managing and 
organizing a large set of patterns for future use. In this paper, we propose a disk-based 
data structure, CFP-tree (Condensed Frequent Pattern Tree ... 

Keywords: data mining and data warehousing, frequent pattern mining 
August 2003 Proceedings of the ninth ACM SIGKDD international conference on
Knowledge discovery and data mining 

Publisher: ACM Press 




http://portal.acm.org/results.cfm?coll=ACM&dl=ACM&CFID=70661397&CFTOKEN 3/6/2006 



Results (page 1): sequential pattern mining using a bitmap representation 






Full text available: pdf(228.14 KB) Additional Information: full citation, abstract, references, citings, index terms

A number of vertical mining algorithms have been proposed recently for association 
mining, which have shown to be very effective and usually outperform horizontal 
approaches. The main advantage of the vertical format is support for fast frequency 
counting via intersection operations on transaction ids (tids) and automatic pruning of 
irrelevant data. The main problem with these approaches is when intermediate results of 
vertical tid lists become too large for memory, thus affecting the algorithm ... 

Keywords: association rule mining, diffsets, frequent itemsets 
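
The frequency counting by intersecting transaction-id lists that the abstract above describes can be illustrated with a minimal sketch. It shows only the basic tidset intersection of the vertical format, not the diffset technique named in the keywords; the function names and toy data are hypothetical.

```python
def vertical_format(transactions):
    """Convert a horizontal database (tid -> items) into item -> set of tids."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def support(itemset, tidsets):
    """Support of an itemset = size of the intersection of its items' tidsets."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must be non-empty")
    tids = set(tidsets.get(items[0], set()))
    for item in items[1:]:
        tids &= tidsets.get(item, set())  # tids lacking this item are dropped
    return len(tids)


# Hypothetical toy database with four transactions.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
tidsets = vertical_format(transactions)
print(support(("a", "c"), tidsets))
```

The intermediate set `tids` is the kind of structure that can outgrow memory on large databases, which is the problem the abstract raises.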



7 Data mining (DM): A new algorithm for gap constrained sequence mining
Salvatore Orlando, Raffaele Perego, Claudio Silvestri

March 2004 Proceedings of the 2004 ACM symposium on Applied computing

Publisher: ACM Press 

Full text available: pdf(287.50 KB) Additional Information: full citation, abstract, references, index terms

The sequence mining problem consists in finding frequent sequential patterns in a 
database of time-stamped events. Several application domains require limiting the 
maximum temporal gap between events occurring in the input sequences. However 
pushing down such constraint is critical for most sequence mining algorithms. In this paper 
we describe CCSM (Cache-based Constrained Sequence Miner), a new level-wise 
algorithm that overcomes the troubles usually related to this kind of constraints. CCSM 
ado ... 

Keywords: constraints, intersection, sequence mining, sequential patterns, vertical 
dataset 



8 Web services and performance evaluation: Indexing web access-logs for pattern
queries

Alexandros Nanopoulos, Yannis Manolopoulos, Maciej Zakrzewicz, Tadeusz Morzy
November 2002 Proceedings of the 4th international workshop on Web information
and data management
Publisher: ACM Press

Full text available: pdf(187.24 KB) Additional Information: full citation, abstract, references, index terms

In this paper, we develop a new indexing method for large web access-logs. We are 
concerned with pattern queries, which advocate the search for access sequences that 
contain certain query patterns. This kind of queries find applications in processing web-log 
mining results (e.g., finding typical/atypical access-sequences). The proposed method 
focuses on scalability to web-logs' sizes. For this reason, we examine the gains due to 
signature-trees, which can further improve the scalability to very ... 

9 User trails: Improving adaptation in web-based educational hypermedia by means of
knowledge discovery
Andrej Kristofic, Maria Bielikova 

September 2005 Proceedings of the sixteenth ACM conference on Hypertext and
hypermedia HYPERTEXT '05

Publisher: ACM Press 

Full text available: pdf(519.13 KB) Additional Information: full citation, abstract, references, index terms

Most adaptive web-based hypermedia systems adapt presentation of the content and/or 
navigation using predefined set of rules. Considering different behavior and preferences of 
each user it may be hard to generalize and construct all appropriate rules in advance. This 
problem is more noticeable in educational adaptive hypermedia systems, where
adaptation to individual learning style of a student is important for the student to 
effectively assess particular domain. In this paper we present techniq ... 

Keywords: adaptive navigation, adaptive web-based educational hypermedia, concept 
recommendation, knowledge discovery, usage patterns 



10 Frequent patterns II: Mining frequent item sets by opportunistic projection
Junqiang Liu, Yunhe Pan, Ke Wang, Jiawei Han 

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Publisher: ACM Press 

Full text available: pdf(864.90 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper, we present a novel algorithm Opportune Project for mining complete set of 
frequent item sets by projecting databases to grow a frequent item set tree. Our 
algorithm is fundamentally different from those proposed in the past in that it 
opportunistically chooses between two different structures, array-based or tree-based, to 
represent projected transaction subsets, and heuristically decides to build unfiltered 
pseudo projection or to make a filtered copy according to features of the ... 

Keywords: association rules, frequent patterns 



11 Computing curricula 2001 

September 2001 Journal on Educational Resources in Computing (JERIC) 

Publisher: ACM Press 

Full text available: pdf(613.63 KB) Additional Information: full citation, references, citings, index terms
html(2.78 KB)



12 High performance data mining (tutorial PM-3)
Vipin Kumar, Mohammed Zaki

August 2000 Tutorial notes of the sixth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Publisher: ACM Press 

Full text available: pdf(8.06 MB) Additional Information: full citation, references, index terms




13 Discovering all most specific sentences 

Dimitrios Gunopulos, Roni Khardon, Heikki Mannila, Sanjeev Saluja, Hannu Toivonen, Ram 
Sewak Sharma 

June 2003 ACM Transactions on Database Systems (TODS), Volume 28 issue 2 
Publisher: ACM Press 

Full text available: pdf(283.09 KB) Additional Information: full citation, abstract, references, citings, index terms

Data mining can be viewed, in many instances, as the task of computing a representation 
of a theory of a model or a database, in particular by finding a set of maximally specific 
sentences satisfying some property. We prove some hardness results that rule out simple 
approaches to solving the problem. The a priori algorithm is an algorithm that has been
successfully applied to many instances of the problem. We analyze this algorithm, and 
prove that is optimal when the maximally specific sen ... 

Keywords: Data mining, association rules, learning with membership queries, maximal
frequent sets, minimal keys 



14 Research sessions: implementation techniques: Implementing database operations
using SIMD instructions

Jingren Zhou, Kenneth A. Ross

June 2002 Proceedings of the 2002 ACM SIGMOD international conference on
Management of data SIGMOD '02
Publisher: ACM Press 

Full text available: pdf(1.39 MB) Additional Information: full citation, abstract, references, citings, index terms

Modern CPUs have instructions that allow basic operations to be performed on several 
data elements in parallel. These instructions are called SIMD instructions, since they apply 
a single instruction to multiple data elements. SIMD technology was initially built into 
commodity processors in order to accelerate the performance of multimedia applications. 
SIMD instructions provide new opportunities for database engine design and 
implementation. We study various kinds of operations in a database con ... 

15 Frequent patterns I: DualMiner: a dual-pruning algorithm for itemsets with constraints
Cristian Bucila, Johannes Gehrke, Daniel Kifer, Walker White

July 2002 Proceedings of the eighth ACM SIGKDD international conference on
Knowledge discovery and data mining 
Publisher: ACM Press 

Full text available: pdf(1.22 MB) Additional Information: full citation, abstract, references, citings, index terms

Constraint-based mining of itemsets for questions such as "find all frequent itemsets 
where the total price is at least $50" has received much attention recently. Two classes of 
constraints, monotone and antimonotone, have been identified as very useful. There are 
algorithms that efficiently take advantage of either one of these two classes, but no 
previous algorithms can efficiently handle both types of constraints simultaneously. In this 
paper, we present the first algorithm (called Dua ... 

16 Compiler-based I/O prefetching for out-of-core applications 
Angela Demke Brown, Todd C. Mowry, Orran Krieger 

May 2001 ACM Transactions on Computer Systems (TOCS), Volume 19 issue 2
Publisher: ACM Press 

Full text available: pdf(499.03 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Current operating systems offer poor performance when a numeric application's working 
set does not fit in main memory. As a result, programmers who wish to solve "out-of- 
core" problems efficiently are typically faced with the onerous task of rewriting an 
application to use explicit I/O operations (e.g., read/write). In this paper, we propose and 
evaluate a fully automatic technique which liberates the programmer from this task, 
provides high performance, and requires only minima ... 

Keywords: compiler optimization, prefetching, virtual memory 



17 MIL primitives for querying a fragmented world 
Peter A. Boncz, Martin L. Kersten 

October 1999 The VLDB Journal — The International Journal on Very Large Data 
Bases, Volume 8 Issue 2 
Publisher: Springer-Verlag New York, Inc.

Full text available: pdf(261.36 KB) Additional Information: full citation, abstract, citings, index terms

In query-intensive database application areas, like decision support and data mining, 
systems that use vertical fragmentation have a significant performance advantage. In 
order to support relational or object oriented applications on top of such a fragmented 
data model, a flexible yet powerful intermediate language is needed. This problem has 
been successfully tackled in Monet, a modern extensible database kernel developed by 
our group. We focus on the design choices made in the Monet interprete ... 

Keywords: Database systems, Main-memory techniques, Query languages, Query 
optimization, Vertical fragmentation 



18 Enabling scalable online personalization on the Web
Debra VanderMeer, Kaushik Dutta, Anindya Datta, Krithi Ramamritham, Shamkant B.
Navathe

October 2000 Proceedings of the 2nd ACM conference on Electronic commerce 
Publisher: ACM Press 

Full text available: pdf(491.69 KB) Additional Information: full citation, references, citings, index terms



Keywords: dynamic profiling, e-commerce, online personalization, user behavior 



19 An architecture to support scalable online personalization on the Web 

Anindya Datta, Kaushik Dutta, Debra VanderMeer, Krithi Ramamritham, Shamkant B. 
Navathe 

August 2001 The VLDB Journal - The International Journal on Very Large Data
Bases, Volume 10 Issue 1
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(167.25 KB) Additional Information: full citation, abstract, citings, index terms

Online personalization is of great interest to e-companies. Virtually all personalization 
technologies are based on the idea of storing as much historical customer session data as 
possible, and then querying the data store as customers navigate through a web site. The 
holy grail of online personalization is an environment where fine-grained, detailed 
historical session data can be queried based on current online navigation patterns for use 
in formulating real-time responses. Unfortunately, as mo ... 

Keywords: Behavior-based personalization, Dynamic lookahead profile, Profile caching, 
Scalable online personalization, Web site and interaction model 



20 Bioinformatics (BIO): Incremental interactive mining of constrained association rules
from biological annotation data with nominal features

Imad Rahal, Dongmei Ren, Amal Perera, Hassan Najadat, William Perrizo, Riad Rahhal, Willy
Valdivia 

March 2005 Proceedings of the 2005 ACM symposium on Applied computing SAC '05 
Publisher: ACM Press 

Full text available: pdf(132.24 KB) Additional Information: full citation, abstract, references, index terms

Data arising from genomic and proteomic experiments is amassing at high speeds 
resulting in huge amounts of raw data; consequently, the need for analyzing such 
biological data — the understanding of which is still lagging way behind — has been 
prominently solicited in the post-genomic era we are currently witnessing. In this paper 
we attempt to analyze annotated genome data by applying a very central data-mining
technique known as association rule mining with the aim of discovering rules cap ... 

Keywords: P-trees, association rule mining, bioinformatics, incremental, interactive, 
yeast genome 